Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[inference_metadata_fields] Clear inference results on explicit nulls #119145

Open
wants to merge 10 commits into
base: inference_metadata_fields
Choose a base branch
from

Conversation

Mikep86
Copy link
Contributor

@Mikep86 Mikep86 commented Dec 19, 2024

Fix a bug where setting a semantic_text source field explicitly to null in an update request to clear inference results did not actually clear the inference results for that field. This bug only affects the new _inference_fields format.

jimczi and others added 7 commits December 18, 2024 12:46
Previously, setting a field explicitly to null in an update request did not work correctly with semantic text fields.
This change resolves the issue by adding an explicit null entry to the `_inference_fields` metadata when such cases occur.

The explicit null value ensures that any prior inference results are overwritten during the merge of the partial update with the latest document version.
@Mikep86 Mikep86 added >non-issue :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v9.0.0 labels Dec 19, 2024
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored into two YAML tests so that we can focus on the explicit nulls in one of them

Copy link
Member

@kderusso kderusso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Overall looks good.

// ensure that the order in the original field is consistent in case of multiple inputs
Collections.sort(responses, Comparator.comparingInt(FieldInferenceResponse::inputOrder));
Map<String, List<SemanticTextField.Chunk>> chunkMap = new LinkedHashMap<>();
for (var resp : responses) {
// Get the first non-null model from the response list
if (model == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do any additional validation here, to verify that model if it exists is compatible with resp.model?

var valueObj = XContentMapValues.extractValue(sourceField, docMap, EXPLICIT_NULL);
if (useLegacyFormat == false && isUpdateRequest && valueObj == EXPLICIT_NULL) {
/**
* It's an update request, and the source field is explicitly set to null,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment here!

*/
var slot = ensureResponseAccumulatorSlot(itemIndex);
slot.addOrUpdateResponse(
new FieldInferenceResponse(field, sourceField, null, order++, 0, null, EMPTY_CHUNKED_INFERENCE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're incrementing order here and we're still incrementing it later on on line 566 in existing code. Do we need to reset the value of order before we iterate through values here? It's a bit confusing on read through.

- match: { hits.total.value: 1 }
- match: { hits.total.relation: eq }

- length: { hits.hits.0._source._inference_fields.sparse_field.inference.chunks: 1 }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we check the offsets here but can we check the embeddings as well? (Verify the right chunk was removed, in a more visual manner?)

@Mikep86 Mikep86 marked this pull request as ready for review December 23, 2024 21:56
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>non-issue :SearchOrg/Relevance Label for the Search (solution/org) Relevance team v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants